Data Visualisation in Python
This is an HTML document. The Introduction to Python course is written and intended to be used in a Jupyter Notebook file. These HTML documents have been made available for users who require screen readers or have other accessibility needs. These HTML documents have been tested, but if you notice any errors or compatibility issues, please contact us on the GSS Capability email inbox.
If you are using a screen reader you will need to set your punctuation level (sometimes called verbosity) to full, especially for the code sections.
Chapter 3 – Creating a Variety of Plot Types
Chapter Overview
Packages and Data
Continuous Data
- Histogram
Discrete
- Bar (Frequency)
Continuous X, Continuous Y
- Scatter Plots Revisited
Continuous Functions
- Line Plots
Discrete X, Continuous Y
- Bar Charts Revisited
- Box Plots
1 Packages and Data
Let’s start, as always, by loading our packages and our data.
We’re using:
- Numpy – Version 1.12.1
- Pandas – Version 0.20.1
- Matplotlib – Version 2.0.2
- Seaborn - Version 0.7.1
Remember you can use the .__version__ attribute (e.g. np.__version__) to check your version.
More information about the packages is given in Chapter 1.
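As a quick sketch of the version check, the snippet below prints the version strings for two of the packages. Only NumPy and Matplotlib are shown; the other packages follow the same pattern, and the dictionary name is just an illustration.

```python
import matplotlib
import numpy as np

# Each package exposes its version as a string in .__version__
versions = {"numpy": np.__version__, "matplotlib": matplotlib.__version__}

for package, version in versions.items():
    print(package, version)
```

If your versions differ from the ones listed above, some parameter names used in this chapter may differ too.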
We’re following standard convention for nicknames, and we’ll also load the gapminder data.
# Load packages
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib import rcParams # To set our defaults
# Load data
gapminder = pd.read_csv("../data/gapminder.csv")
We’ll also use the magic command
%matplotlib inline
This means any plot we create will be automatically embedded below the code cell once the code has been executed.
#%matplotlib inline
When we load Seaborn it uses its own style as the default, overriding Matplotlib’s.
Here I’m setting the default style to "ticks", which is the most similar to our Matplotlib default.
sns.set_style("ticks")
Here we’ll also bring in those default values we talked about in the last chapter.
# Set Default Fonts
rcParams["font.family"] = "sans-serif"
rcParams["font.sans-serif"] = ["Arial", "Tahoma"]
# Set Default font sizes
small_size = 12
medium_size = 14
bigger_size = 16
# Change the font size for individual elements
matplotlib.rc("font", size=small_size) # controls default text sizes
matplotlib.rc("axes", titlesize=small_size) # fontsize of the axes title
matplotlib.rc("axes", labelsize=medium_size) # fontsize of the x and y labels
matplotlib.rc("xtick", labelsize=small_size) # fontsize of the tick labels
matplotlib.rc("ytick", labelsize=small_size) # fontsize of the tick labels
matplotlib.rc("legend", fontsize=small_size) # legend fontsize
matplotlib.rc("axes", titlesize=medium_size) # title fontsize
As a reminder, visualisation code can get lengthy quickly.
A lot of these visualisations will only have one or two new concepts; the rest of the code will be things covered previously.
To make the code clearer we will be using lots of comments. In Python these look like this:
# This is a comment
As we get into more complicated visualisations, new concepts will have the word NEW - at the start of the comment, e.g.:
# NEW - Set X Axis
We will also refer to line numbers to describe content.
If you are using Jupyter Notebook, you can turn on line numbers in the View Menu -> Toggle Line Numbers
2 Continuous Data
2.1 Histogram
We create histograms in Matplotlib using axes.hist().
Histograms take one argument, the X axis value.
In a histogram each column represents a group (bin) of a continuous, quantitative variable. People often confuse these with bar charts, where each column represents a group defined by a categorical variable.
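To make the distinction concrete, here is a minimal sketch using made-up values rather than the Gapminder data. np.histogram() does the binning that a histogram draws, while a plain count per category is what a frequency bar chart draws.

```python
from collections import Counter

import numpy as np

# Made-up life expectancies (continuous) and continents (categorical)
life_exp = np.array([52.1, 58.3, 61.0, 64.7, 70.2, 71.9, 75.4, 79.8])
continents = ["Asia", "Europe", "Africa", "Africa", "Asia", "Africa"]

# Histogram: split the numeric range into 3 equal-width bins and count values
bin_counts, bin_edges = np.histogram(life_exp, bins=3)
print(bin_counts)  # one count per bin
print(bin_edges)   # 4 edges define 3 bins

# Bar chart: count how often each category occurs
category_counts = Counter(continents)
print(category_counts)
```

The bins come from the data's numeric range, whereas the bar chart groups come from the category labels themselves.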
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.hist(x = gapminder["life_exp"])
# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.03, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")
# Set xlim - the lowest x value - to 0
axes.set_xlim(0)
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
2.1.1 Changing bin sizes
The default number of groups, called bins, is defined by the rcParams setting hist.bins, and in my version is 10.
To get finer control we can change the bins = parameter inside our axes.hist(). Here I’m setting my number of bins to 40.
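The sketch below, which uses made-up values in place of the life_exp column, looks up the default and shows what axes.hist() returns when bins = 40 is passed. The non-interactive "Agg" backend is an assumption made so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# The default number of bins comes from the rcParams
print(matplotlib.rcParams["hist.bins"])

# Made-up values standing in for a continuous column
values = np.linspace(30, 80, 200)

figure, axes = plt.subplots()
counts, edges, patches = axes.hist(values, bins=40)

print(len(counts))  # number of bins
print(len(edges))   # always one more edge than bins
plt.close(figure)
```

axes.hist() returns the counts and bin edges as well as drawing the bars, which is handy for checking what a given bins value actually did.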
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.hist(gapminder["life_exp"],
bins = 40) # NEW - Change the bin size
# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set xlim to 0
axes.set_xlim(0)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
We can of course use range() or np.arange() to set our bins programmatically. More information on the np.arange() function can be found in the NumPy documentation. In the example below, bin edges are created every 10 years, starting at 0 and stopping before the maximum life expectancy + 1. Any values above the last edge are left out of the histogram.
You might find it helpful to run the code below to view the bin ranges before applying them to the plot.
np.arange(start = 0, stop = (gapminder["life_exp"].max() + 1), step = 10)
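As a small sketch of how these edges behave, the snippet below uses made-up values, with 82.6 standing in for the column maximum. It counts how many values land inside the edges, then tries a wider stop value, which is an assumption of this sketch rather than the course's method.

```python
import numpy as np

# Made-up values; 82.6 plays the role of the column maximum
life_exp = np.array([35.0, 47.5, 62.3, 79.9, 82.6])

# Same pattern as above: edges every 10 years, stopping before max + 1
edges = np.arange(start=0, stop=life_exp.max() + 1, step=10)
counts, _ = np.histogram(life_exp, bins=edges)
print(edges)
print(counts.sum())  # fewer than 5 if a value lies beyond the last edge

# Assumption: extending stop by one full step keeps the maximum inside a bin
wider_edges = np.arange(start=0, stop=life_exp.max() + 10, step=10)
wider_counts, _ = np.histogram(life_exp, bins=wider_edges)
print(wider_counts.sum())
```

Comparing the summed counts with the number of rows is a quick way to check that no values have been dropped by the chosen edges.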
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.hist(gapminder["life_exp"],
bins = np.arange(start = 0,
stop = (gapminder["life_exp"].max() + 1),
step = 10) ) # NEW - Change the bin size
# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set xlim to 0
axes.set_xlim(0)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
Have an experiment at changing the number of bins yourself.
We can create a horizontal histogram by setting orientation = "horizontal" in our axes.hist().
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.hist(gapminder["life_exp"],
bins = 15, # Change the bin size
orientation = "horizontal")
# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
# The histogram is horizontal, so the counts run along the x axis
axes.set_xlabel("Count")
axes.set_ylabel("Life expectancy at birth in years ")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
2.1.2 Colours
We can use the same colour formats here as we explored in Chapter 2.
We won’t be going over all the details again in these subsequent sections; but will just be showing you an example or two with different methods.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.hist(gapminder["life_exp"],
bins = 15, # Change the bin size
color = "orange")
# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
We can also set the edge colour for the histogram by using edgecolor = .
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.hist(gapminder["life_exp"],
bins = 15, # Change the bin size
color = "orange", # Set the colour of the bar
edgecolor = "black") # NEW - Set the colour of outside of each bar
# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
3 Discrete Data
3.1 Bar (Frequency) Chart
In this section we’ll look at bar charts as they relate to frequency of the data. Later, we’ll look at bar charts with a continuous y and discrete x.
Firstly, I’m going to create a new column: if the value in gdp_per_cap is greater than or equal to the mean of the gdp_per_cap column, the new column (above_mean_gdp) will have a 1; if not, it will have a 0.
I’ll then filter that for 1992 and use a groupby to find the number of countries above and below the mean.
# Add in a new column - 1 if greater than or equal to mean GDP of the total data. 0 if less than total mean GDP of data.
gapminder["above_mean_gdp"] = (gapminder["gdp_per_cap"] >= gapminder["gdp_per_cap"].mean()).astype("int64")
# Create a new dataset just for 1992
gapminder_1992 = gapminder[gapminder["year"] == 1992]
# Group by this new column and do a count.
mean_gdp_92 = gapminder_1992.groupby("above_mean_gdp")["country"].count()
# Look at the Data
mean_gdp_92
above_mean_gdp
0    90
1    52
Name: country, dtype: int64
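The same steps can be sketched on a tiny made-up table, which makes it easier to check the flag and the counts by hand. The value_counts() line is an alternative added for comparison, not the method used in this chapter.

```python
import pandas as pd

# Made-up GDP figures, not the Gapminder data
example = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "gdp_per_cap": [500, 1500, 30000, 42000, 800],
})

# 1 if at or above the mean, 0 otherwise
example["above_mean_gdp"] = (
    example["gdp_per_cap"] >= example["gdp_per_cap"].mean()
).astype("int64")

# groupby + count, as in the chapter
by_group = example.groupby("above_mean_gdp")["country"].count()

# value_counts gives the same numbers; sort_index puts 0 before 1
by_value = example["above_mean_gdp"].value_counts().sort_index()

print(by_group.tolist(), by_value.tolist())
```

Either result has 0 and 1 as its index, which is what lets us use the index as the bar positions.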
I can then plot this data using axes.bar():
First we specify the data we want to plot.
In earlier versions of Matplotlib this takes the parameter left =; this was changed in more modern versions to x =.
For compatibility we’ve not included the parameter name, but we highly advise you find out which one your version uses and apply it, as named parameters can make your code easier to read.
The height= parameter gives the height of our bars. The tick_label = parameter takes the values specified and applies them as tick labels.
As the groupby() gave us the values of 0 and 1 as the index we set our data (e.g left = or x=) to mean_gdp_92.index, and our height is just mean_gdp_92, as we don’t have any other columns.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.bar(mean_gdp_92.index, # The X axis is the index values
height = mean_gdp_92,
tick_label = ["Below Mean", "Above Mean"])
# Add Labels, Title and Captions
plt.suptitle("Countries with below and above mean GDP", x = 0.3)
plt.title("1992 Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Countries")
axes.set_ylabel("Number of Countries")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
3.1.1 Plotting Categorical Variables
Below I’m preparing my data. I’m using a groupby to return the number of countries in each continent. I’m using this parameter here:
as_index = False
If we don’t set this parameter our continents will become our index.
Usually this is fine. However, in the version of Matplotlib used here, bar charts can’t take text values for the X axis; this includes an index that is text.
By having a numerical index we can create our bar chart and then use the column with the continents to display our tick labels.
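To see the difference as_index makes, here is a minimal sketch on made-up rows. The DataFrame and its values are invented for illustration.

```python
import pandas as pd

# Made-up rows, one per country
example = pd.DataFrame({
    "continent": ["Asia", "Africa", "Asia", "Europe"],
    "country": ["A", "B", "C", "D"],
})

# Default behaviour: the group labels become the index
indexed = example.groupby(by="continent")["country"].count()
print(indexed.index.tolist())

# as_index=False: the labels stay in a column and the index is numeric
flat = example.groupby(by="continent", as_index=False)["country"].count()
print(flat.index.tolist())
print(flat.columns.tolist())
```

The numeric index of the second result is what gets passed as the bar positions, with the continent column kept aside for the tick labels.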
number_countries = gapminder_1992.groupby(by = "continent", as_index= False)["country"].count()
number_countries.rename(columns = {"country": "num_countries"}, inplace= True) # Change the column name to be more descriptive
number_countries
  continent  num_countries
0    Africa             52
1  Americas             25
2      Asia             33
3    Europe             30
4   Oceania              2
As mentioned earlier, the first argument (left = or x =) needs numeric values here. We’re using the index of the DataFrame.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.bar(number_countries.index, # The X axis is the index values as these are numbers
height = number_countries["num_countries"]);
# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
We can add the names of the continents back by using the tick_label = parameter and setting this to the continent column.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.bar(number_countries.index, # The X axis is the index values
height = number_countries["num_countries"],
tick_label = number_countries["continent"]); # NEW -Set the tick labels to be the continent names, not just numbers.
# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
3.1.1.1 Seaborn
Seaborn makes it much easier to create this bar chart.
To match the other visualisation the data needs to be sorted by continent, as Seaborn plots the bars in the order the categories appear in the data rather than alphabetically. The code below passes gapminder_1992 as it is; sort it first (for example with .sort_values(by = "continent")) if you want the same order.
The x axis is set to be the continent column, the y axis to be the life expectancy column (by default it will plot the mean of the column) and the data to be the gapminder_1992 DataFrame.
By default sns.barplot() has confidence interval bars on; we turn those off by setting ci = None.
estimator = is by default the mean; but a different argument here can be passed, such as the count (using np.count_nonzero) as done here.
bar_plot = sns.barplot(x="continent", y = "life_exp" , data=gapminder_1992,
ci = None, # Removes confidence interval bars
estimator = np.count_nonzero); # Uses the count - mean is default
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -15, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Continent")
bar_plot.set_ylabel("Count")
# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
3.1.2 Ordered Bar Charts
Good practice is often to sort our bar chart values in ascending or descending order. Which order depends on what we want to highlight – the smallest or the largest values.
Here I’m creating a new dataframe with the ascending values. Note that it’s really important to reset the index here – after all we are using the index to state what order we want our bars on the chart.
# Create a new dataframe - where the number of countries is sorted in ascending order.
asc_number_countries = number_countries.sort_values(by = "num_countries").reset_index()
asc_number_countries.head()
   index continent  num_countries
0      4   Oceania              2
1      1  Americas             25
2      3    Europe             30
3      2      Asia             33
4      0    Africa             52
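The effect of resetting the index can be sketched as below, rebuilding the small table from the counts shown earlier. Passing drop=True is an optional variation, not what the chapter's code does: it discards the old index rather than keeping it as an extra "index" column.

```python
import pandas as pd

# Counts copied from the number_countries output shown earlier
number_countries = pd.DataFrame({
    "continent": ["Africa", "Americas", "Asia", "Europe", "Oceania"],
    "num_countries": [52, 25, 33, 30, 2],
})

sorted_only = number_countries.sort_values(by="num_countries")
print(sorted_only.index.tolist())  # old positions survive the sort

# drop=True discards the old index instead of keeping it as a column
asc_clean = sorted_only.reset_index(drop=True)
print(asc_clean.index.tolist())
print(asc_clean.columns.tolist())
```

Without the reset, the index would still hold the old positions, so plotting against it would put the bars back in their original order.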
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.bar(asc_number_countries.index, # The X axis is the index values
height = asc_number_countries["num_countries"],
tick_label = asc_number_countries["continent"]); # Set the tick labels to be the continent names, not just numbers.
# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
And in descending order we just set ascending = False.
# Create a new dataframe - where the number of countries is sorted in descending order.
desc_number_countries = number_countries.sort_values(by = "num_countries", ascending=False).reset_index()
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.bar(desc_number_countries.index, # The X axis is the index values
height = desc_number_countries["num_countries"],
tick_label = desc_number_countries["continent"]); # Set the tick labels to be the continent names, not just numbers.
# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
For Seaborn, again, using pre-sorted data is the simplest way to get your plot to look the way you desire.
I am using the asc_number_countries and the desc_number_countries DataFrames we created earlier.
Note that here Seaborn colours each bar a different colour; as our continents are a broad group our guidelines say they should be the same colour. We’ll cover how to do this in a later section.
# Ascending
bar_plot = sns.barplot(x="continent", y = "num_countries",
data=asc_number_countries,
ci=None, # Removes confidence interval bars
estimator=np.sum); # Uses the sum
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -15, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Continent")
bar_plot.set_ylabel("Count")
# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
And in descending order. Here I’ve set the color = parameter to be blue, which applies to all of the bars.
# Descending
bar_plot = sns.barplot(x="continent", y = "num_countries",
data=desc_number_countries,
ci=None, # Removes confidence interval bars
estimator=np.sum, # Uses the sum
color="blue") # Sets every bar to the same colour
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -15, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Continent")
bar_plot.set_ylabel("Count")
# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
3.1.3 Horizontal Bar Charts
If tick labels are long a horizontal bar chart is often preferable to rotating tick labels.
A horizontal bar chart is created by using axes.barh().
Note the parameters are different from those of our axes.bar() above.
Again, depending on your version of Matplotlib, the parameter for the data may be different; older versions use bottom = and later versions use y =. We’ve omitted the parameter name for compatibility, but recommend you find which is the correct one for your version.
width is the values on our x axis; here, the number of countries.
tick_label applies our names to the labels as before.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.barh(number_countries.index, # The Y axis is the index values
width = number_countries["num_countries"],
tick_label = number_countries["continent"]); # Set the tick labels to be the continent names, not just numbers.
# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Continent")
axes.set_xlabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
3.1.3.1 Seaborn
To create a horizontal bar chart in Seaborn you need to swap the x and y axes, and set the parameter orient = "h".
bar_plot = sns.barplot(y="continent", x = "num_countries",
data=number_countries, # note x and y are reversed
ci=None, # Removes confidence interval bars
estimator=sum, # Uses the sum - mean is default
orient="h") # Changes the orientation to horizontal
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=35, y= 5.5, s="Source: Gapminder", ha="left")
bar_plot.set_ylabel("Continent")
bar_plot.set_xlabel("Count")
# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "x", color = (0.745, 0.745, 0.745))
# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
3.1.4 Spacing Between Bars
In Matplotlib you can alter the spacing by adjusting the width of the bars using the width parameter.
This takes a scale from 0 (farthest apart) to 1 (touching). The default is 0.8
The example below is an extreme one – it’s definitely not best practice!
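As a rough sketch of why a smaller width means a bigger gap, the snippet below uses made-up heights and bars one unit apart, and measures the space between two neighbouring bars. The "Agg" backend is an assumption so that it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

figure, axes = plt.subplots()
bars = axes.bar([0, 1, 2], height=[5, 3, 4], width=0.2)

first, second = bars[0], bars[1]
# Gap = left edge of the second bar minus right edge of the first
gap = second.get_x() - (first.get_x() + first.get_width())
print(round(gap, 2))
plt.close(figure)
```

With bars placed one unit apart, the gap is simply 1 minus the width, so a width of 1 leaves no gap at all.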
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.bar(number_countries.index, # The X axis is the index values
height = number_countries["num_countries"],
tick_label = number_countries["continent"], # Set the tick labels to be the continent names, not just numbers.
width = 0.2); # NEW - Set the width of the bars
# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
Again this is also possible in Seaborn, but can require a function to modify the patches attribute.
Help for this is available in several Stack Overflow answers
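One possible shape for such a function is sketched below. The helper name change_bar_width is hypothetical, and it is demonstrated on a plain Matplotlib bar chart, on the assumption that the bars Seaborn draws are stored in the same patches list of the axes it returns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def change_bar_width(axes, new_width):
    """Hypothetical helper: resize every bar on the axes, keeping it centred."""
    for patch in axes.patches:
        difference = patch.get_width() - new_width
        patch.set_width(new_width)
        # Shift the left edge by half the change so the centre stays put
        patch.set_x(patch.get_x() + difference / 2)


figure, axes = plt.subplots()
axes.bar([0, 1, 2], height=[5, 3, 4])  # default width of 0.8
change_bar_width(axes, 0.4)

first_bar = axes.patches[0]
print(first_bar.get_x(), first_bar.get_width())
plt.close(figure)
```

With a Seaborn plot you would pass the object returned by sns.barplot() as the axes argument; treat that as untested against your Seaborn version.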
3.1.5 Colouring Bars
Again, we won’t go through all of the colour options here; as these are covered in chapter 2.
We should use the same colour and shade for categorical data that cannot be organised into broad groups.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.bar(number_countries.index, # The X axis is the index values
height = number_countries["num_countries"],
tick_label = number_countries["continent"], # Set the tick labels to be the continent names, not just numbers.
color = "CadetBlue"); # NEW - Set colour
# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
We’ll discuss multiple colours when we look at bar charts with a continuous y and discrete x later. Our continents are a broad group, so according to guidance it’s not appropriate to colour them individually.
3.1.5.1 Seaborn
Seaborn uses the color parameter.
bar_plot = sns.barplot(x="continent", y = "life_exp",
data=gapminder_1992,
ci=None, # Removes confidence interval bars
estimator=np.count_nonzero, # Uses the count - mean is default
color="blue");
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -10.5, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Continent")
bar_plot.set_ylabel("Count")
# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
3.1.6 Value Labels
We can also create value labels for bars. If you find yourself regularly labelling individual bar values, a table may be a better presentation method. In alignment with the GSS guidelines bar values should be aligned at the bottom of the bars to enable easy comparison.
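The labelling step on its own can be sketched as follows, using three made-up bar values. It uses enumerate(), a stylistic alternative to looping over range(len(...)), and places each label at the base of its bar.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

counts = [52, 25, 33]  # made-up bar values
labels = ["Africa", "Americas", "Asia"]

figure, axes = plt.subplots()
axes.barh(range(len(counts)), width=counts, tick_label=labels)

# enumerate gives the bar position and its value together
for position, value in enumerate(counts):
    axes.annotate(str(value), xy=(0, position), color="white", va="center")

print([text.get_text() for text in axes.texts])
plt.close(figure)
```

Each label sits at x = 0, so all the values line up along the base of the bars.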
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.barh(number_countries.index, # The Y axis is the index values
width = number_countries["num_countries"],
tick_label = number_countries["continent"]); # Set the tick labels to be the continent names, not just numbers.
# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Continent")
axes.set_xlabel("Count")
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
# Create value labels programmatically
# Create a list of the total values to loop over
country_count = number_countries["num_countries"].tolist()
# Loop over the values
for i in range(len(country_count)):
    axes.annotate(country_count[i], # Text is the value of the loop we're currently on
                  xy = (0, i), # At position 0 on the x axis, and the y position of the current loop
                  color = "white", # Set the colour of the text to white
                  va = "center") # Centre the text vertically on the bar
plt.show();
4 Continuous X, Continuous Y
4.1 Scatter Plots Revisited
We covered scatter plots in great detail in chapter 2, however we will revisit some elements here.
As a basic reminder, we plot scatter plots like this in:
Matplotlib:
# Make our 1987 data again
gm_1987 = gapminder[gapminder["year"] == 1987]
gm_1987.head()
        country continent  year  ...  infant_mortality  fertility  above_mean_gdp
7   Afghanistan      Asia  1987  ...               NaN        NaN               0
19      Albania    Europe  1987  ...              40.8       3.13               0
31      Algeria    Africa  1987  ...              46.3       5.51               0
43       Angola    Africa  1987  ...             134.1       7.20               0
55    Argentina  Americas  1987  ...              27.1       3.06               1
[5 rows x 9 columns]
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.scatter(x=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();
Pandas:
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
gm_1987.plot(x = "gdp_per_cap", y = "life_exp",
kind = "scatter",
ax = axes) # Plot on the axes
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();
4.1.1 Choosing Your Axis
We covered this in Chapter 2, but to reiterate, there are several ways to set the limits of the x and y axes. These include:
- axes.set(ylim=(0), xlim=(0))
- axes.set_ylim(bottom=0, top = 10000) and axes.set_xlim(left=0)
- plt.xlim() and plt.ylim()
If we don’t pass a parameter such as top, Matplotlib will choose a value for us automatically. We discussed the importance of starting the y axis at 0. Here I’ve used axes.set_ylim(bottom=0, top = 85), which leaves more white space above the data and can make the points at the top more readable.
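To see the difference between leaving a limit to Matplotlib and fixing it ourselves, here is a minimal sketch. It uses made-up values rather than the gapminder data, and the non-interactive "Agg" backend is only an assumption so that it runs without a display; in a notebook you would not need that line.

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# Made-up values standing in for GDP and life expectancy
x_values = [1, 2, 3, 4]
y_values = [55.0, 62.5, 70.0, 78.0]

figure, axes = plt.subplots(figsize=(6, 5))
axes.scatter(x=x_values, y=y_values)

# Fix only the lower limit; the upper limit is left to Matplotlib
axes.set_ylim(bottom=0)
auto_bottom, auto_top = axes.get_ylim()

# Now fix both limits, as in the chapter's examples
axes.set_ylim(bottom=0, top=85)
fixed_bottom, fixed_top = axes.get_ylim()

print("Automatic top:", auto_top)
print("Fixed limits:", fixed_bottom, fixed_top)
plt.close(figure)
```

The automatic upper limit sits just above the largest value, whereas passing top lets us choose how much white space to leave.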
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.scatter(x=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();
And Seaborn:
plot = sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987,
fit_reg = False )
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16, y = 1.05, x = 0.65)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left", y = 1.01)
plt.text(x= 20000.0, y = -12, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
# Get axes and change grid lines and colours
axes = plot.ax
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)
plt.ylim(0, None) # Set our Y axis to start from 0 or it "floats"
plt.show();
Seaborn added a dedicated function, sns.scatterplot(), in version 0.9.0. Versions from 0.9.0 onwards might not be available within your government department, as they were quite new at the time of writing (November 2020). If you can upgrade, you may also need to update other packages (e.g. numpy and scipy). We recommend doing so in a virtual environment, especially if your role requires particular versions of packages. If you want to experiment with non-sensitive data, options like Google Colab let you try packages without installing them on your machine.
For this plot the code is given as a markdown cell, because it won’t run for everyone if it is in a code cell. There will also be an image of the output.
# Plot
plot = sns.scatterplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987)
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
plot.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16, y = 1.0, x = 0.5)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left", y = 1.01, x = -0.06)
plt.text(x= 20000.0, y = -20, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
# Note the methods are set_xlabel
plot.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
plot.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
plt.ylim(0, None); # Set our Y axis to start from 0 or it "floats"
4.1.2 Changing Markers
4.1.2.1 Marker type
All three types of plot use the markers from matplotlib. You can find a table of markers and definitions on the matplotlib website.
Note that the parameter in Matplotlib and in .plot() is marker.
The parameter in Seaborn is markers.
4.1.2.2 Marker size
This can be achieved in Matplotlib and .plot() by adding the parameter
s
For a scatter plot, s is the marker area in points squared. A point is the same unit used for fonts and is equivalent to 1/72 of an inch. While it’s possible some people work out their ideal marker size and apply it, most visualisations involve experimenting until something looks “about right”.
The s parameter may work in newer versions of Seaborn; if it does not, the following parameter will:
scatter_kws={"s": 72}
Sufficient Google searching couldn’t find a documented scale of measurement for these markers in Seaborn. Since scatter_kws is passed on to Matplotlib, the same scale appears to apply, but again the advice seemed to follow the “stop when it looks about right” family.
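Because Matplotlib documents s for scatter plots as an area in points squared, doubling the width of a marker means multiplying s by four. The sketch below illustrates this with made-up sizes; the square-root figure is only an approximation of the marker width, and the "Agg" backend is an assumption so that the code runs without a display.

```python
import math

import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

figure, axes = plt.subplots(figsize=(6, 5))

# Illustrative values for s, each four times the last
marker_areas = [18, 72, 288]
for position, area in enumerate(marker_areas):
    axes.scatter(x=[position], y=[1], s=area, marker="s")

# Approximate side length of each square marker, in points
approx_widths = [math.sqrt(area) for area in marker_areas]

# The sizes Matplotlib has stored for each set of markers
stored_sizes = [float(collection.get_sizes()[0]) for collection in axes.collections]

print("Stored s values:", stored_sizes)
print("Approximate widths in points:", approx_widths)
plt.close(figure)
```

In practice this means small changes to s can look underwhelming; if a marker needs to look twice as big, try quadrupling the value.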
4.1.2.3 Marker colours
We covered colours in depth in chapter 2.
In Matplotlib and .plot() colours can be altered using the color parameter, in a variety of ways including:
color = "blue" # Using named or web safe colours
color = (0, 0.239, 0.349) # Using RGB colours, each expressed as a decimal from 0 to 1
color = "#A0E7E5" # Using hexadecimal codes
Matplotlib:
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.scatter(x=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
marker = "x", # Set marker style
s = 72, # Set Size
color = (0, 0.239, 0.349)) # Set colour
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();
Pandas:
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
gm_1987.plot(x = "gdp_per_cap", y = "life_exp",
kind = "scatter",
marker = "D", # Set marker style
s = 40, # Set Marker Size
color = "cyan", # Set color
ax = axes) # Plot on the axes
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();
For Seaborn we can include colours in the scatter_kws={} dictionary.
This accepts most of the methods shown above, but struggles with RGB values.
The transparency (or alpha) in Seaborn seems to be set to around 80%, so don’t worry if your colours don’t come out quite as you’d expect. We’ll look at how to set the alpha in the next section.
scatter_kws={"color": "#4C2C2E"}
Note that we can put both the size and the color arguments in the same dictionary, e.g.
scatter_kws={"s": 200, "color": "#4C2C2E"}
plot = sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987,
fit_reg = False,
markers = "*",
palette= "red",
scatter_kws = {"s": 200,
"color": "#4C2C2E"})
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16, y = 1.05, x = 0.65)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left", y = 1.01)
plt.text(x= 20000.0, y = -13, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)
# Get axes and change grid lines and colours
axes = plot.ax
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
plt.ylim(0, None) # Set our Y axis to start from 0 or it "floats"
plt.show();
4.1.3 Using Alpha With Large Amounts Of Data
We also talked about alpha in chapter 2. This can be really useful with large amounts of data.
For our Matplotlib and .plot() charts this is as simple as setting the alpha parameter to a decimal between 0 (transparent) and 1 (solid).
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.scatter(x=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
alpha = 0.5) # Set transparency
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();
Pandas:
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
gm_1987.plot(x = "gdp_per_cap", y = "life_exp",
kind = "scatter",
alpha = 0.5,
ax = axes) # Plot on the axes
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();
For Seaborn this is set in our scatter_kws={} dictionary using alpha as the key.
Frustratingly, in sns.lmplot() alpha seems to be set automatically to around 80%; bear this in mind when choosing colours!
plot = sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987,
fit_reg = False,
markers = "*",
palette = "red",
scatter_kws = {"alpha" : 1,
"color": "#4C2C2E"})
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Graph showing Life Expectancy by GDP per Capita",
fontname="Arial", size=16, y = 1.05, x = 0.65)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left", y = 1.01)
plt.text(x= 20000.0, y = -12 , s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)
# Get axes and change grid lines and colours
axes = plot.ax
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
plt.ylim(0, None) # Set our Y axis to start from 0 or it "floats"
plt.show();
5 Continuous Functions
5.1 Line Plots
In Matplotlib the function axes.plot() produces a line plot.
We pass the data in the order x and then y. Here we pass them by position, rather than as named parameters as we have done previously.
# Set up some new data to plot
uk_gapminder = gapminder[gapminder["country"] == "United Kingdom"]
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.plot(uk_gapminder["year"], uk_gapminder["life_exp"])
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
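axes.set_ylim(bottom=0, top = 85)
plt.show();
If we also draw a scatter plot of the same data, we can see how the line relates to the individual observations. Matplotlib joins consecutive points with straight segments rather than fitting a curve, so any impression of smoothing, or of dots not sitting perfectly in the middle of the line, comes from how the plot is drawn rather than from the data being changed.
Note that I’ve not set the y axis to start at 0 here, just to make the shape of the line easier to see.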
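One way to check how the line and the markers relate is to compare the coordinates Matplotlib stores for each. This is a small sketch with made-up values, not the gapminder data, and the "Agg" backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for the UK data
years = [1952, 1957, 1962, 1967]
life_exp = [69.2, 70.4, 70.8, 71.4]

figure, axes = plt.subplots(figsize=(6, 5))
(line,) = axes.plot(years, life_exp)
points = axes.scatter(x=years, y=life_exp, color="red")

# Coordinates the line passes through
line_vertices = np.column_stack([line.get_xdata(), line.get_ydata()]).astype(float)
# Coordinates where the scatter markers are centred
marker_positions = np.asarray(points.get_offsets(), dtype=float)

same_positions = np.array_equal(line_vertices, marker_positions)
print("Line vertices match marker positions:", same_positions)
plt.close(figure)
```

If the two sets of coordinates match, the line is passing through each observation and only the segments in between are drawn for us.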
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.plot(uk_gapminder["year"], uk_gapminder["life_exp"])
# Plot the Data
axes.scatter(x = uk_gapminder["year"], y = uk_gapminder["life_exp"],
color = "red")
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
plt.show();
Pandas’ .plot() creates a line graph by default.
Here I’ve specified kind = "line" so that anyone who is just reading the code can see what the output will be.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
uk_gapminder.plot(x = "year", y = "life_exp",
kind = "line",
ax = axes) # Plot on the axes
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
axes.legend(loc = "lower right")
plt.show();
Interestingly, Seaborn only introduced line plots in version 0.9.
As mentioned in the scatter plot section above, the code for these plots is given as markdown cells, because it won’t run for everyone if it is in a code cell. There will also be an image of the output.
5.1.1 Seaborn Relplot
plot = sns.relplot(x = "year", y = "life_exp", data= uk_gapminder,
kind = "line")
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, y = 1.06, x = 0.35)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left", y = 1.01, x = -0.13)
plt.text(y = -12, x = 1989, s = "Source: Gapminder", ha="left",
fontname="Arial", size=12)
#Set the X and Y axis Labels
# Note the methods are set_xlabels -with a s!
plot.set_xlabels("Year", fontname="Arial", size=12)
plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)
# Get axes and plot grids
axes = plot.ax
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set the Y axis lower limit to 0 and upper to 90 to give space.
plt.ylim(0, 90); # Set our Y axis to start from 0 or it "floats"
5.1.2 Seaborn Lineplot
# Lineplot will plot onto a figure and axes object, giving finer control.
figure, axes = plt.subplots(figsize=(6, 5))
# Create the plot
plot = sns.lineplot(x = "year", y = "life_exp", data = uk_gapminder)
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, y = 1.0, x = 0.35)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left", y = 1.04, x = -0.13)
plt.text(y = -12, x = 1989, s = "Source: Gapminder", ha="left",
fontname="Arial", size=12)
# Note the methods are set_xlabel here -
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Get axes and plot grids
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set the Y axis lower limit to 0 and upper to 90 to give space.
plt.ylim(0, 90); # Set our Y axis to start from 0 or it "floats"
5.1.3 Plotting Through The Origin
As we’ve previously covered, it is good practice to plot through the origin. As mentioned above, there are several ways to do this.
Note that the visualisation on the left, where the y axis has not been set to start at 0, appears to show a much steeper curve than the one on the right, where the axis has been adjusted. This could be considered misleading.
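To put a rough number on this effect, the sketch below measures what share of the visible y axis the data occupies under each choice. The values are made up, share_of_axis is a hypothetical helper written for this illustration, and the "Agg" backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# Made-up values standing in for the UK data
years = [1952, 1977, 2007]
life_exp = [69.0, 73.0, 79.0]

figure, (axes1, axes2) = plt.subplots(1, 2, figsize=(10, 4))
axes1.plot(years, life_exp)          # Y axis limits chosen automatically
axes2.plot(years, life_exp)
axes2.set_ylim(bottom=0, top=85)     # Y axis starting from 0


def share_of_axis(ax, values):
    """Hypothetical helper: fraction of the visible y axis covered by the data's range."""
    bottom, top = ax.get_ylim()
    return (max(values) - min(values)) / (top - bottom)


auto_share = share_of_axis(axes1, life_exp)
zero_share = share_of_axis(axes2, life_exp)
print("Share of axis with automatic limits:", round(auto_share, 2))
print("Share of axis starting from 0:", round(zero_share, 2))
plt.close(figure)
```

With automatic limits the same change fills most of the plot height, which is why the curve looks so much steeper.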
# Create our figure and our axes
figure, (axes1, axes2) = plt.subplots(1, 2, figsize=(10, 4))
### Axes 1 ###
# Plot the Data for ax 1
axes1.plot(uk_gapminder["year"], uk_gapminder["life_exp"])
# Set x and y axis labels
axes1.set_xlabel("Year", fontname="Arial", size=12)
axes1.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes1.set_title("Automatically imputed Y axis", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes1.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes1.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes1.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes1.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes1.set_frame_on(False)
axes1.set_axisbelow(True)
### Axes 2 ###
# Plot the Data for ax 2
axes2.plot(uk_gapminder["year"], uk_gapminder["life_exp"])
# Set x and y axis labels
axes2.set_xlabel("Year", fontname="Arial", size=12)
axes2.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes2.set_title("With Y axis set to 0", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes2.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes2.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes2.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes2.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes2.set_frame_on(False)
axes2.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes2.set_ylim(bottom=0, top = 85)
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, x = 0.3, y = 1.05)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left", x = -1.25, y = 1.09)
figure.text(x=0.75, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
plt.show();
5.1.4 Line Styles
Matplotlib line styles include:
['solid', 'dashed', 'dashdot', 'dotted' | '-', '--', '-.', ':', 'None']
More details can be found in the Matplotlib documentation pages.
Some of these are duplicates – “dotted” and “:” produce an identical result. Note again that “dotted” makes more sense to humans reading code than “:” does.
The same arguments also work for our .plot() visualisation.
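The point about duplicate line styles can be checked by drawing two lines, one with the name and one with the symbol, and asking each which style it has stored. This is a minimal sketch with made-up values; the "Agg" backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# Made-up values
x_values = [0, 1, 2]
y_values = [0, 1, 4]

figure, axes = plt.subplots(figsize=(6, 5))
(named_line,) = axes.plot(x_values, y_values, linestyle="dotted")  # Human-readable name
(symbol_line,) = axes.plot(x_values, y_values, linestyle=":")      # Shorthand symbol

same_style = named_line.get_linestyle() == symbol_line.get_linestyle()
print("Both lines use the same style:", same_style)
plt.close(figure)
```

Since both spellings end up as the same stored style, the choice between them is about readability for whoever reads the code next.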
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.plot(uk_gapminder["year"], uk_gapminder["life_exp"],
linestyle = "dotted") # NEW set a linestyle
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();
The same arguments also work for our .plot() visualisation.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
uk_gapminder.plot(x = "year", y = "life_exp",
kind = "line",
ax = axes, # Plot on the axes
linestyle = "dashed" ) # NEW - Set a linestyle.
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
axes.legend(loc = "lower right")
plt.show();
5.1.5 Line Colours
We can set colours the same ways as we’ve seen before. Here we’ll just show a few visualisations for demonstration purposes.
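As a reminder of how the colour formats relate to each other, the sketch below converts between them using Matplotlib’s colors module. The variable names are illustrative; the colour values are the ones used as examples earlier in this chapter.

```python
import matplotlib.colors as mcolors

# A named colour converted to an RGB tuple of decimals between 0 and 1
named_as_rgb = mcolors.to_rgb("blue")

# An RGB tuple converted to a hexadecimal code
rgb_tuple = (0, 0.239, 0.349)
rgb_as_hex = mcolors.to_hex(rgb_tuple)

# A hexadecimal code converted to an RGB tuple
hex_code = "#A0E7E5"
hex_as_rgb = mcolors.to_rgb(hex_code)

print("blue as RGB:", named_as_rgb)
print(rgb_tuple, "as hex:", rgb_as_hex)
print(hex_code, "as RGB:", hex_as_rgb)
```

Any of these forms can be passed to the color parameter, so it is usually easiest to stick with whichever format your style guide or colour palette is published in.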
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.plot(uk_gapminder["year"], uk_gapminder["life_exp"],
color = "#235956") # NEW set a colour
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();
The same arguments also work for our .plot() visualisation.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
uk_gapminder.plot(x = "year", y = "life_exp",
kind = "line",
ax = axes, # Plot on the axes
linestyle = "dashed", # NEW - Set a linestyle.
color = (0.076, 0.808, 0.84 ))
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK",
fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
axes.legend(loc = "lower right")
plt.show();
5.1.6 Plotting Multiple Lines
As we saw in chapter 2 with scatter plots we can plot multiple lines on the same axis.
5.1.6.1 Matplotlib
This can be as simple as calling axes.plot() more than once on the same axes.
You can also consider labelling the lines directly rather than, or in addition to, using a legend.
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data Line for UK
axes.plot(gapminder[gapminder["country"] == "United Kingdom"]["year"], # Our "x"
gapminder[gapminder["country"] == "United Kingdom"]["life_exp"], # Our "y"
color = "#003D59", # Set colour
label = "United Kingdom" ) # Set label
axes.annotate("United Kingdom",
xy = (gapminder[gapminder["country"] == "United Kingdom"]["year"].max() + 0.3,
gapminder[gapminder["country"] == "United Kingdom"]["life_exp"].max()),
color = "#003D59") # Set colour
axes.plot(gapminder[gapminder["country"] == "Malawi"]["year"], # Our "x"
gapminder[gapminder["country"] == "Malawi"]["life_exp"], # Our "y"
color = "#A8BD3A", # Set a colour
label = "Malawi")
axes.annotate("Malawi",
xy = (gapminder[gapminder["country"]== "Malawi"]["year"].max() + 0.3, #x + little pad
gapminder[gapminder["country"]== "Malawi"]["life_exp"].max()), #y
color = "#A8BD3A") # Set colour
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year",
fontname="Arial", size=16, x = 0.43, y = 1.05)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left", y = 1.1, x = -0.025)
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
# Customise and show the Legend
axes.legend(bbox_to_anchor=(0., 1.0, 1.0, .102), loc="lower left",
ncol=2,
mode="expand",
borderaxespad=0,
prop={"family": "Arial", "size":12})
plt.show();
As we’ve seen before, we can also use a loop to plot multiple values. Take care not to overload a line plot with data and create a “spaghetti plot”.
Plotting a line for each of the values in country (142), or even for the countries in Europe (30), would be far too many. For these plots we’ll plot the mean life expectancy per year by continent. To do this we group by continent and year, and find the mean life expectancy. .reset_index() then moves continent and year out of the index and back into ordinary columns, so the rows are numbered from 0 again.
continent_year_life_exp = gapminder.groupby(["continent", "year"])["life_exp"].mean().reset_index()
continent_year_life_exp # View the data
continent year life_exp
0 Africa 1952 39.135500
1 Africa 1957 41.266346
2 Africa 1962 43.319442
3 Africa 1967 45.334538
4 Africa 1972 47.450942
5 Africa 1977 49.580423
6 Africa 1982 51.592865
7 Africa 1987 53.344788
8 Africa 1992 53.629577
9 Africa 1997 53.598269
10 Africa 2002 53.325231
11 Africa 2007 54.806038
12 Americas 1952 53.279840
13 Americas 1957 55.960280
14 Americas 1962 58.398760
15 Americas 1967 60.410920
16 Americas 1972 62.394920
17 Americas 1977 64.391560
18 Americas 1982 66.228840
19 Americas 1987 68.090720
20 Americas 1992 69.568360
21 Americas 1997 71.150480
22 Americas 2002 72.422040
23 Americas 2007 73.608120
24 Asia 1952 46.314394
25 Asia 1957 49.318544
26 Asia 1962 51.563223
27 Asia 1967 54.663640
28 Asia 1972 57.319269
29 Asia 1977 59.610556
30 Asia 1982 62.617939
31 Asia 1987 64.851182
32 Asia 1992 66.537212
33 Asia 1997 68.020515
34 Asia 2002 69.233879
35 Asia 2007 70.728485
36 Europe 1952 64.408500
37 Europe 1957 66.703067
38 Europe 1962 68.539233
39 Europe 1967 69.737600
40 Europe 1972 70.775033
41 Europe 1977 71.937767
42 Europe 1982 72.806400
43 Europe 1987 73.642167
44 Europe 1992 74.440100
45 Europe 1997 75.505167
46 Europe 2002 76.700600
47 Europe 2007 77.648600
48 Oceania 1952 69.255000
49 Oceania 1957 70.295000
50 Oceania 1962 71.085000
51 Oceania 1967 71.310000
52 Oceania 1972 71.910000
53 Oceania 1977 72.855000
54 Oceania 1982 74.290000
55 Oceania 1987 75.320000
56 Oceania 1992 76.945000
57 Oceania 1997 78.190000
58 Oceania 2002 79.740000
59 Oceania 2007 80.719500
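To make the effect of .reset_index() concrete, here is a minimal sketch on a tiny DataFrame. The group names and numbers are made up purely for illustration (they are not Gapminder values); only the groupby pattern matches the code above.

```python
import pandas as pd

# Toy data: made-up values, NOT taken from the Gapminder dataset
toy = pd.DataFrame({
    "continent": ["A", "A", "A", "A", "B", "B"],
    "year":      [2000, 2000, 2005, 2005, 2000, 2005],
    "life_exp":  [50.0, 60.0, 52.0, 62.0, 70.0, 72.0],
})

# Without reset_index the group keys live in a two-level index
grouped = toy.groupby(["continent", "year"])["life_exp"].mean()
print(grouped.index.names)  # The keys are index levels, not columns

# With reset_index the keys become ordinary columns and the rows are numbered from 0
flat = grouped.reset_index()
print(flat)
```

Having continent and year as ordinary columns is what lets us filter with expressions such as flat[flat["continent"] == "A"] when plotting.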
Now we have just 5 lines to plot, but still can get an overview of our data.
# Set up the arguments for the loop
continents = continent_year_life_exp["continent"].unique().tolist() # Create a list with each unique value in continent column
continents.sort() # Sort this list in alphabetical order
my_palette = ["#342040", "#865CA1", "#66B2C4" ,"#98C8D4", "#A9E8A6"] # Create a list of colours I want
# zips together our continent names and palette colours into pairs (tuples)
continent_palette = zip(continents, my_palette)
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Loop through each continent and each colour in turn
for continent_name, colour in continent_palette: # Each pair holds a continent name and its colour
    # Get only the data for the continent we are looking at
    continent_rows = continent_year_life_exp[continent_year_life_exp["continent"] == continent_name]
    axes.plot(continent_rows["year"],
              continent_rows["life_exp"],
              c=colour, # Use corresponding colour from continent_palette
              label=continent_name) # Gives each continent a label for our legend
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year",
fontname="Arial", size=16, x = 0.43, y = 1.05)
plt.title("Mean life expectancy per continent" ,
fontname="Arial", size=12, loc="left", y = 1.1, x = -0.025)
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
# Gridlines
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
#axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Customise and show the Legend
plt.legend(bbox_to_anchor=(1.05, 1),
loc=2,
borderaxespad=0,
frameon=False, # Turn off border around legend
prop={"family": "Arial", "size":12})
plt.show();
We can also use our multiplot method here:
# NEW - # Set up the arguments for the loop
continents = continent_year_life_exp["continent"].unique().tolist() # Create a list with each unique value in continent column
continents.sort() # Sort this list in alphabetical order
my_palette = ["#335C67" ,"#F5D000", "#E09F3E", "#CC6163", "#540B0E"] # Create a list of colours I want
# Create our figure and our axes
figure, axes = plt.subplots(nrows= 3,
ncols=2, figsize=(15, 15))
figure.subplots_adjust(hspace=0.5)
# zips together our axes, continent names and palette colours into tuples of three
continent_palette = zip(axes.flatten(), continents, my_palette)
# Loop through each axes, continent and colour in turn
for each_ax, continent_name, colour in continent_palette:
    # Get only the data for the continent we are looking at
    continent_rows = continent_year_life_exp[continent_year_life_exp["continent"] == continent_name]
    each_ax.plot(continent_rows["year"],
                 continent_rows["life_exp"],
                 c=colour) # Use corresponding colour from continent_palette
    each_ax.set_title(label = continent_name, loc = "left")
    each_ax.set_xlabel("Year", fontname="Arial", size=12)
    each_ax.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
    each_ax.grid(b = True, which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
    each_ax.grid(b = True, which = "both", axis = "y", color = (0.745, 0.745, 0.745))
    each_ax.set_frame_on(False)
    each_ax.set_axisbelow(True)
    # Set the x and y lims to be the same for each vis - or we lose scale.
    each_ax.set_ylim(bottom=0, top = ( continent_year_life_exp["life_exp"].max() + (continent_year_life_exp["life_exp"].max() / 10) ))
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year",
fontname="Arial", size=16, y = 0.93, x = 0.25)
figure.text(x=0.12, y= 0.9, s="Mean life expectancy per continent", ha="left",
fontname="Arial", size=14)
figure.text(x=0.8, y= 0.1, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
# Remove the empty plot at the bottom right
figure.delaxes(axes[2][1]) # Rows and Cols - starts at 0
plt.show();
This code creates one chart per continent. We can also plot all of the lines on every chart, highlighting one line (in this case one continent) per plot.
Here we’ve looped using the enumerate() function, which is useful as it also gives an index for each iteration of the loop.
With enumerate we need to loop over the same number of items as the subplot axes we’re creating (here 6). That’s why we’ve needed to add an empty value to the continents list, and why the greys list has six entries.
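One possible alternative, sketched here under a few assumptions, avoids padding the lists: zip() stops at the shortest input, so pairing the axes with the five continents simply leaves the sixth axes unused, and it can then be removed. The data is a small, rounded subset of the continent means shown earlier, and sharey=True is an optional extra that keeps every panel on the same y scale.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Rounded continent means for the first and last year only
means = pd.DataFrame({
    "continent": (["Africa"] * 2 + ["Americas"] * 2 + ["Asia"] * 2
                  + ["Europe"] * 2 + ["Oceania"] * 2),
    "year": [1952, 2007] * 5,
    "life_exp": [39.14, 54.81, 53.28, 73.61, 46.31, 70.73,
                 64.41, 77.65, 69.26, 80.72],
})

continents = sorted(means["continent"].unique())
grey, bold_highlight = "#999999", "#203C89"

figure, axes = plt.subplots(nrows=3, ncols=2, figsize=(15, 15), sharey=True)

# zip() stops after the fifth continent, so no blank entry is needed
for each_ax, highlighted in zip(axes.flatten(), continents):
    for continent_name in continents:
        rows = means[means["continent"] == continent_name]
        is_highlighted = continent_name == highlighted
        each_ax.plot(rows["year"], rows["life_exp"],
                     c=bold_highlight if is_highlighted else grey,
                     zorder=3 if is_highlighted else 2)  # Draw the highlight on top
    each_ax.set_title(label=highlighted, loc="left")
    each_ax.set_ylim(bottom=0, top=means["life_exp"].max() * 1.1)

# Remove any axes that did not receive a continent (here just the last one)
for unused_ax in axes.flatten()[len(continents):]:
    figure.delaxes(unused_ax)
```

The trade-off is that the colour is decided inside the inner loop rather than from a prepared list, which some readers may find harder to follow than the greys list.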
# Set Up
continents = continent_year_life_exp["continent"].unique().tolist()
# Create a list with each unique value in continent column
continents.sort() # Sort this list in alphabetical order
continents.append(" ")
# We have to plot 6 subplots, even though we have 5 data sets - this adds a blank " " at the end.
# This will be the colour of the highlighted continent
bold_highlight = "#203C89"
# Create our figure and our axes
figure, axes = plt.subplots(nrows= 3, ncols=2, figsize=(15, 15)) # Create a grid that's 3 x 2 (6 subplots)
figure.subplots_adjust(hspace=0.5)
# Loop through each subplot axes in turn
for index, each_ax in enumerate(axes.flatten()):
    greys = ["#999999", "#999999", "#999999", "#999999", "#999999", "#999999"]
    # This needs to match the number of axes; even though there's no data on the last one
    greys[index] = bold_highlight
    # Pairs each continent with its colour (grey, or the highlight colour for this subplot)
    continent_palette = zip(continents, greys)
    # Loop through each continent and each colour in turn
    for continent_name, colour in continent_palette: # Each pair holds a continent name and its colour
        # Get only the data for the continent we are looking at
        continent_rows = continent_year_life_exp[continent_year_life_exp["continent"] == continent_name]
        each_ax.plot(continent_rows["year"],
                     continent_rows["life_exp"],
                     c=colour) # Use corresponding colour from continent_palette
    # Set our titles, labels, grid, ylimits etc
    each_ax.set_title( label = continents[index], loc = "left")
    each_ax.set_xlabel("Year", fontname="Arial", size=12)
    each_ax.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
    each_ax.grid(b = True, which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
    each_ax.grid(b = True, which = "both", axis = "y", color = (0.745, 0.745, 0.745))
    each_ax.set_frame_on(False)
    each_ax.set_axisbelow(True)
    # Set the x and y lims to be the same for each vis - or we lose scale.
    each_ax.set_ylim(bottom=0, top = continent_year_life_exp["life_exp"].max()
                     + (continent_year_life_exp["life_exp"].max() / 10))
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year",
fontname="Arial", size=16, y = 0.93, x = 0.25)
figure.text(x=0.12, y= 0.9, s="Mean life expectancy per continent", ha="left",
fontname="Arial", size=14)
figure.text(x=0.8, y= 0.1, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
# Remove the empty plot at the bottom right
figure.delaxes(axes[2][1]) # Rows and Cols - starts at 0
plt.show();
We can also plot multiple lines on one plot in Seaborn, using similar parameters to the multi plot we did for scatter.
Here we have a line for each continent on the same visualisation.
plot = sns.relplot(x = "year", y = "life_exp", data= continent_year_life_exp,
kind = "line",
hue = "continent")
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year",
fontname="Arial", size=16, y = 1.06, x = 0.35)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, loc="left", y = 1.01, x = -0.13)
plt.text(y = -12, x = 1989, s = "Source: Gapminder", ha="left",
fontname="Arial", size=12)
#Set the X and Y axis Labels
# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Year", fontname="Arial", size=12)
plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)
# Set gridlines
axes = plot.ax
axes.grid(b = True, which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True, which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set the Y axis lower limit to 0 and upper to 90 to give space.
plt.ylim(0, 90); # Set our Y axis to start from 0 or it "floats"
We can also create multiplots in Seaborn, using similar parameters to the multi plot we did for scatter.
plot = (sns.relplot(x = "year", y = "life_exp", data= continent_year_life_exp,
kind = "line",
col = "continent",
col_wrap = 2)
.set_titles("") # Without this we also end up with a central title!
.set_titles("{col_name}", loc = "left") ) # Without this the titles are "continent = Asia" etc
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year",
fontname="Arial", size=16, y = 1.02, x = 0.21)
plt.text(x = 1942, y = 291, s = "For each continent", fontname="Arial", size=14 )
# When I use "title" here it removes the "Oceania" label
plt.text(x= 2050, y=-18, s="Source: Gapminder", ha="left",
fontname="Arial", size=12)
#Set the X and Y axis Labels
# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Year", fontname="Arial", size=12)
plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)
# Set gridlines
for index, each_ax in enumerate(plot.axes.flatten()):
    each_ax.grid(b = True, which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
    each_ax.grid(b = True, which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set the Y axis lower limit to 0 and upper to 90 to give space.
plt.ylim(0, 90); # Set our Y axis to start from 0 or it "floats"
6 Discrete X, Continuous Y
6.1 Bar Charts Revisited
A lot of the visualisations in this section are like those in the bar chart section from earlier in the chapter.
The main difference is here we are specifying both an X and a Y variable, and previously we only specified a Y variable.
As a result we won’t be covering things like:
- categorical variables
- horizontal bar charts
- colouring bars
- bar spacing
- setting our X and Y limits
- setting value labels
The methods from that earlier section work here too, so please feel free to revisit it if you need help.
We’ll create some mean life expectancy data, by grouping by continent and finding the mean of the life_exp column.
cont_mean_life_exp = gapminder.groupby(by = "continent", as_index= False)["life_exp"].mean()
cont_mean_life_exp
continent life_exp
0 Africa 48.865330
1 Americas 64.658737
2 Asia 60.064903
3 Europe 71.903686
4 Oceania 74.326208
Note that if, rather than setting height to be a count column, we use a continuous variable like the life_exp column, the bar chart will have a continuous y axis while retaining the discrete x axis.
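As a small illustration of using a continuous column for height, the sketch below draws the five continent means from the table above and places each value label just above its bar, reading the position from the bar itself. Positioning the labels this way is an alternative chosen for this sketch, not the approach used in the main example.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# The continent means shown in the table above
cont_mean_life_exp = pd.DataFrame({
    "continent": ["Africa", "Americas", "Asia", "Europe", "Oceania"],
    "life_exp": [48.865330, 64.658737, 60.064903, 71.903686, 74.326208],
})

figure, axes = plt.subplots(figsize=(6, 5))
bars = axes.bar(cont_mean_life_exp.index,                      # Discrete x positions
                height=cont_mean_life_exp["life_exp"],         # Continuous y values
                tick_label=cont_mean_life_exp["continent"])

# Each bar knows its own position and height, so no separate list is needed
for bar in bars:
    axes.annotate("{:.2f}".format(bar.get_height()),           # Text rounded to 2 decimal places
                  xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
                  xytext=(0, 3), textcoords="offset points",   # Nudge the text 3 points upwards
                  ha="center", va="bottom")
axes.set_ylabel("Mean Life Expectancy")
```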
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))
# Plot the Data
axes.bar(cont_mean_life_exp.index, # The X axis is the index values
height = cont_mean_life_exp["life_exp"], # NEW
tick_label = cont_mean_life_exp["continent"]) # Set the tick labels to be the continent names, not just numbers.
# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)
# Add Labels, Title and Captions
plt.suptitle("Mean Life Expectancy per Continent", x=0.3)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.12)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Mean Life Expectancy")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
# Create a list of the mean values to loop over
country_mean = round(cont_mean_life_exp["life_exp"], 2).tolist() # Rounded here otherwise they're too long
# Loop over the values
for i in range(len(country_mean)):
    axes.annotate(country_mean[i], # Text is the value of the loop we're currently on
                  xy = (i, 1), # At the x position of the current bar, and at 1 on the y axis (just above the base of the bar)
                  ha = "center", # Centre the text horizontally on the bar
                  color = "white") # Set the colour of the text to white
plt.show();
Seaborn
Seaborn, again, makes it much easier to create this bar chart.
To match the other visualisation I need to sort my gapminder data by continent.
I can simply set my x axis to be the continent column, the y axis to be the life expectancy column (Seaborn automatically assumes I want the mean) and my data as the sorted DataFrame gapminder_sort_continent.
By default sns.barplot() has confidence interval bars on; we turn those off by setting ci = None.
gapminder_sort_continent = gapminder.groupby("continent")["life_exp"].mean().reset_index()
gapminder_sort_continent = gapminder_sort_continent.sort_values("continent").reset_index(drop = True) # Sort, and keep the sorted result
bar_plot = sns.barplot(x="continent", y = "life_exp", data=gapminder_sort_continent,
ci = None)
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Mean Life Expectancy per Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -15, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Continent")
bar_plot.set_ylabel("Mean Life Expectancy")
# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
6.1.1 Multiple Bars
While this is possible in Matplotlib, it is much simpler to do in Seaborn.
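For comparison, here is a rough sketch of the Matplotlib route: each group's bars are shifted sideways by a multiple of the bar width so that they sit side by side. It is one common way of doing this rather than the only one. The values are rounded mean life expectancies from the continent table earlier in this chapter, used only because they are short enough to type in.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

years = [1997, 2002, 2007]
# Rounded mean life expectancy per continent for those years
values = {
    "Africa": [53.60, 53.33, 54.81],
    "Asia":   [68.02, 69.23, 70.73],
    "Europe": [75.51, 76.70, 77.65],
}

figure, axes = plt.subplots(figsize=(6, 5))
x_positions = np.arange(len(years))   # One slot per year: 0, 1, 2
bar_width = 0.8 / len(values)         # Share 80% of each slot between the groups

for group_number, (continent, heights) in enumerate(values.items()):
    # Shift each continent's bars so they sit side by side within the slot
    offset = (group_number - (len(values) - 1) / 2) * bar_width
    axes.bar(x_positions + offset, heights, width=bar_width, label=continent)

axes.set_xticks(x_positions)
axes.set_xticklabels(years)
axes.set_xlabel("Year")
axes.set_ylabel("Mean life expectancy")
axes.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
```

Every extra group means recalculating the widths and offsets by hand, which is the bookkeeping that the hue argument in Seaborn takes care of.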
First we’ll create a new DataFrame that shows the total population per continent for each year from 1992 to 2007 (the last year in our data).
pop_continent_after_92 = (gapminder[gapminder["year"] >= 1992] # Filter for data 1992 onwards
.groupby(["year", "continent"], as_index = False) # Group by year and continent, keeping them as columns
.agg({"pop" : np.sum}) # Sum of the population
)
pop_continent_after_92["pop_millions"] = pop_continent_after_92["pop"] / 1000000
pop_continent_after_92.head()
year continent pop pop_millions
0 1992 Africa 6.590815e+08 659.081517
1 1992 Americas 7.392741e+08 739.274104
2 1992 Asia 3.133292e+09 3133.292191
3 1992 Europe 5.581428e+08 558.142797
4 1992 Oceania 2.091965e+07 20.919651
bar_plot = sns.barplot(x="year", y = "pop_millions" , data = pop_continent_after_92,
hue = "continent", # colour the bar according to the continent
ci = None)
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Total population per Continent", x = 0.22, y = 1.05, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.047, y = 1.05)
plt.text(x=2.5, y= -900, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Year")
bar_plot.set_ylabel("Total population \n (millions)")
# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) # Moves the legend to a box outside of the plot.
plt.show();
Stacked bar charts are again possible in Matplotlib and Seaborn, but our simplest option is to use the .plot() method from Pandas, as this has the argument stacked = True.
These are commonly used for percentages, or where the bars will add up to the same number.
In this section we’ll be comparing the populations of each continent. These are represented as a percentage of the total for each year.
The plot below uses data from the years 1952, 1972, 1987 and 2002. The differences are not vast; but you can see that Europe’s population as a percentage of the whole has shrunk, and Africa’s has increased.
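The percentage calculation is the step that is easiest to get wrong, so here is a minimal sketch of it on toy numbers. The continents "A", "B" and "C" and their populations are made up for illustration (they are not Gapminder values); the groupby and transform pattern is the same one used on the real data.

```python
import pandas as pd

# Toy populations in millions: made-up values, NOT taken from the Gapminder dataset
toy = pd.DataFrame({
    "year":      [1952, 1952, 1952, 2002, 2002, 2002],
    "continent": ["A", "B", "C", "A", "B", "C"],
    "pop":       [20, 30, 50, 60, 60, 80],
})

# Total population for each year and continent
totals = toy.groupby(["year", "continent"])["pop"].sum().rename("count")

# transform("sum") repeats each year's total alongside every row of that year,
# so the division gives each continent's share of its own year
share = (totals * 100) / totals.groupby(level=0).transform("sum")

# One row per year and one column per continent: the shape a stacked bar chart needs
wide = share.reset_index().pivot(index="year", columns="continent", values="count")
print(wide)
```

Each row of wide adds up to 100, which is why every stacked bar ends up the same height.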
# Prepare the Data
percentage = (gapminder[gapminder["year"].isin([1952, 1972, 1987, 2002])] # Select the Years
.groupby(["year","continent"])["pop"].sum().rename("count") )
percentage = (percentage * 100)/ percentage.groupby(level=[0]).transform("sum") # Makes the column a percentage
percentage = percentage.reset_index()
percentage.pivot(index="year", columns="continent", values="count").plot(kind="bar" ,stacked = True)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.); # Moves the legend to a box outside of the plot.
# Obtain the axes object so we can manipulate it
axes = plt.gca()
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Population as a % of the year's total", x = 0.26, y = 1.05, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.047, y = 1.05)
plt.text(x=2.5, y= -30, s="Source: Gapminder", ha="left")
axes.set_xlabel("Year")
axes.set_ylabel("Percentage")
# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) # Moves the legend to a box outside of the plot.
plt.show();
These are more commonly seen as a horizontal version. Again, they’re not considered very good examples of visualisation and we would advise using them sparingly.
percentage.pivot(index="year", columns="continent", values="count").plot(kind="barh" ,stacked = True)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.); # Moves the legend to a box outside of the plot.
# Obtain the axes object so we can manipulate it
axes = plt.gca()
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Population as a % of the year's total", x = 0.22, y = 1.05, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" ,
fontname="Arial", size=12, x = 0.047, y = 1.05)
plt.text(x=80, y= -1, s="Source: Gapminder", ha="left")
axes.set_ylabel("Year")
axes.set_xlabel("Percentage")
# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = (0.745, 0.745, 0.745), which = "major")
axes.grid(b = True, axis = "y", c = "white", which = "major")
axes.set_frame_on(False)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) # Moves the legend to a box outside of the plot.
plt.show();
6.2 Box Plots
Although not commonly shown in publications, we felt it was important to include box plots (and violin plots) as they’re commonly used in exploratory data analysis. As such, these are often “less attractive” than previous plots we’ve seen.
# Set up the Data
life_exp = gm_1987[gm_1987["life_exp"].notnull()]["life_exp"]
# Set up the axes and figure
figure, axes = plt.subplots(figsize=(6, 5))
# Plot
axes.boxplot(x = life_exp.values)
# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)
# Add Labels, Title and Captions
plt.suptitle("Life Expectancy", x = 0.12)
plt.title("1987 Data from Gapminder Dataset",
fontname="Arial", size = 12, x = 0.12)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Life Expectancy")
axes.set_xlabel("1987")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
A Pandas version also exists:
figure, axes = plt.subplots(figsize=(6, 5))
gm_1987.boxplot(column = "life_exp")
# Obtain the axes object so we can manipulate it
axes = plt.gca()
# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)
# Add Labels, Title and Captions
plt.suptitle("Life Expectancy", x = 0.12)
plt.title("1987 Data from Gapminder Dataset" ,
fontname = "Arial", size=12, x = 0.12)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Life Expectancy")
axes.set_xlabel("1987")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
And using Seaborn:
# Set up the figure and axes
figure, axes = plt.subplots(figsize=(6,6))
# Plot
box_plot = sns.boxplot(y = "life_exp", data = gm_1987)
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Life Expectancy", x = 0.22, y = 1., fontname="Arial", size=16,)
plt.title("1987 Data from Gapminder Dataset" ,
fontname = "Arial", size = 12, x = 0.23, y = 1.05)
plt.text(x=0.2, y= 30, s="Source: Gapminder", ha="left")
box_plot.set_ylabel("Life Expectancy")
box_plot.set_xlabel("1987")
# Set Gridlines and colours
box_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set Tick Colours to the same grey as our gridlines
box_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
box_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
6.2.1 Colours
Customisation is quite limited; again this is often an exploratory visualisation rather than explanatory.
This code also works for the Seaborn and the Pandas plots.
# Create the figure and axes
figure, axes = plt.subplots(figsize=(6, 5))
#Plot the visualisation
axes.boxplot(x = life_exp.values,
boxprops = dict(linestyle="-", linewidth=2, color="b"), # NEW, customise the box
medianprops = dict(linestyle="-", linewidth=2, color="r")) # NEW customise the median
# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)
# Add Labels, Title and Captions
plt.suptitle("Life Expectancy", x = 0.12)
plt.title("1987 Data from Gapminder Dataset" ,
fontname = "Arial", size = 12, x = 0.12)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Life Expectancy")
axes.set_xlabel("1987")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
6.2.2 Multiple Boxes
For multiple box plots we can do a groupby() and use .get_group() to access and plot each group.
This is a less complicated way of plotting a multiple box plot than looping over the boxes; although that is still a valid approach.
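For reference, the looping approach mentioned above might look something like the sketch below. Building the names and the data from the same groupby means the labels cannot drift out of step with the boxes. The values are rounded continent means for three years from earlier in this chapter, standing in for the country-level 1987 data, and the tick labels are set separately with set_xticklabels as a choice made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Rounded continent means for 1952, 1977 and 2007 (illustrative stand-in data)
means = pd.DataFrame({
    "continent": ["Africa"] * 3 + ["Europe"] * 3 + ["Oceania"] * 3,
    "life_exp": [39.14, 49.58, 54.81, 64.41, 71.94, 77.65, 69.26, 72.86, 80.72],
})

groups = means.groupby("continent")["life_exp"]

# Iterating over a groupby gives (name, values) pairs, in sorted group order
names = [name for name, _ in groups]
data = [values.values for _, values in groups]

figure, axes = plt.subplots(figsize=(6, 5))
boxes = axes.boxplot(data,
                     medianprops=dict(linestyle="-", linewidth=2, color="r"))

# Boxes are drawn at positions 1, 2, 3... so label those positions
axes.set_xticks(range(1, len(names) + 1))
axes.set_xticklabels(names)
axes.set_ylabel("Life Expectancy")
```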
# Set up the figure and axis
figure, axes = plt.subplots(figsize=(6, 5))
# remove na values
life_exp = gm_1987[gm_1987["life_exp"].notnull()]
#Plot the visualisation
groups = life_exp.groupby("continent")["life_exp"]
axes.boxplot([groups.get_group("Africa"),
groups.get_group("Americas"),
groups.get_group("Asia"),
groups.get_group("Europe"),
groups.get_group("Oceania")],
labels = ["Africa", "Americas", "Asia", "Europe", "Oceania"],
medianprops = dict(linestyle="-", linewidth=2, color="r"))
# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)
# Add Labels, Title and Captions
plt.suptitle("Life Expectancy", x = 0.12)
plt.title("1987 Data from Gapminder Dataset" ,
fontname = "Arial", size = 12, x = 0.12)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Life Expectancy")
axes.set_xlabel("Continent")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.show();
We can also do this plot easily in Seaborn by setting x = "continent"; adding hue = "continent" gives each box its own colour.
# Set up the figure and axes
figure, axes = plt.subplots(figsize=(6,6 ))
# Plot
box_plot = sns.boxplot(x = "continent",
y = "life_exp",
hue = "continent",
data = gm_1987)
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Life Expectancy", x = 0.22, y = 1., fontname="Arial", size=16,)
plt.title("1987 Data from Gapminder Dataset" ,
fontname = "Arial", size = 12, x = 0.23, y = 1.05)
plt.text(x=3, y= 30, s="Source: Gapminder", ha="left")
box_plot.set_ylabel("Life Expectancy (years)")
box_plot.set_xlabel("Continent")
# Set Gridlines and colours
box_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
# Set Tick Colours to the same grey as our gridlines
box_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
box_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) # Moves the legend to a box outside of the plot.
plt.show();
7 End of Chapter
You have completed chapter 3 of the Data Visualisation course. Please move on to chapter 4.